Analysis of a Contour-based Representation for Melody

Authors

  • Youngmoo E. Kim
  • Wei Chai
  • Ricardo García
  • Barry Vercoe
Abstract

Identifying a musical work from a melodic fragment is a task that most people accomplish with relative ease. Researchers have long worked to give computers the same ability, since it would be the cornerstone of any query-by-humming system. To that end, it is reasonable to study how humans perform this task and to assess which features we use to judge melodic similarity. Research has shown that melodic contour is an important feature in determining melodic similarity, but it is also clear that rhythmic information matters. The goal of this research is to explore which variants of contour and rhythmic information yield the most efficient, robust, and scalable representation for melody. We intend this to be the basis of a query-by-humming system that will be used to test the validity of our proposed representation.

The importance of melodic contour

The literature suggests that a coarse melodic contour description is more important to listeners than strict intervals in determining melodic similarity. Experiments have shown that interval direction alone (i.e., the 3-level +/-/0 contour representation) is an important element of melody recognition. There is, however, both anecdotal and experimental evidence that humans use more than interval direction in assessing melodic similarity. In an experiment by Lindsay (1996), subjects were asked to repeat (sing) a melody that was played for them. He found that while there was some correlation between sung interval accuracy and musical experience, even musically inexperienced subjects were able to negotiate different interval sizes fairly successfully. From a practical standpoint, a 3-level representation will generally require longer queries to arrive at a unique match. Given these perceptual and practical considerations, we chose to explore finer (5- and 7-level) contour divisions for our representation.

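The practical point about query length can be illustrated with a rough counting argument. The sketch below is not from the paper: it assumes an idealized case in which every contour symbol is equally likely and independent, so that n intervals at k levels can distinguish at most k^n candidates. The function name `min_query_intervals` and the figure of 10,000 candidate fragments are hypothetical, chosen only for illustration.

```python
def min_query_intervals(num_candidates: int, levels: int) -> int:
    """Smallest n with levels**n >= num_candidates.

    Idealized lower bound: assumes contour symbols are uniformly
    distributed and independent, which real melodies are not.
    """
    n = 0
    while levels ** n < num_candidates:
        n += 1
    return n


if __name__ == "__main__":
    # 10_000 is a hypothetical number of candidate fragments.
    for levels in (3, 5, 7):
        print(levels, "levels ->", min_query_intervals(10_000, levels), "intervals")
```

Under these assumptions a 3-level contour needs at least 9 intervals to separate 10,000 candidates, against 6 for a 5-level and 5 for a 7-level contour. Real queries would need more, since melodic intervals are far from uniformly distributed.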
Proposed melody representation

We used a triple (T, P, B) to represent each melody, where T is the time signature of the song, P is the pitch contour vector, and B is the beat number vector. The range of values of P varies with the number of contour levels used, but follows the pattern 0, +, -, ++, --, +++, etc. The first value of B is the location of the first note within its measure, in beats (according to the time signature). Successive values of B are incremented by the number of beats between successive notes. Values of B are quantized to the nearest whole beat.

Additionally, we used a vector Q to represent different contour resolutions and quantization boundaries. The length of Q indirectly reveals the number of contour levels being used, and the individual values of Q give the absolute values of the quantization boundaries (in half-steps). For example, Q = [0 1] means that interval changes are quantized into three levels: 0 for no change, + for an ascending interval (a boundary at one half-step or more), and - for a descending interval. This representation is equivalent to the popular +/-/0 or U/D/R (up/down/repeat) representation. Q = [0 1 3] quantizes intervals into five levels: 0 for no change, + for an ascending half-step or whole-step (1 or 2 half-steps), ++ for an ascent of at least a minor third (3 or more half-steps), - for a descending half-step or whole-step, and -- for a descent of at least a minor third.

Thus far, we have assembled a data set of 50 multi-track MIDI files containing a mixture of popular and classical music. The popular music selections come from a variety of countries. All selected songs had a separate monophonic melody track.
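The quantization described above can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the authors' implementation. The function names, the use of MIDI note numbers as input, the choice to emit one contour symbol per pair of successive notes, and rounding half-beats upward are all assumptions made for the example.

```python
from bisect import bisect_right
from math import floor


def quantize_interval(semitones: int, q: list[int]) -> str:
    """Map a signed interval in half-steps to a contour symbol.

    q holds ascending absolute boundaries, e.g. [0, 1] or [0, 1, 3].
    The level is the index of the last boundary <= |semitones|.
    """
    level = bisect_right(q, abs(semitones)) - 1
    if level == 0:
        return "0"
    return ("+" if semitones > 0 else "-") * level


def pitch_contour(pitches: list[int], q: list[int]) -> list[str]:
    """Contour vector P: one symbol per pair of successive notes (assumed)."""
    return [quantize_interval(b - a, q) for a, b in zip(pitches, pitches[1:])]


def beat_vector(onsets_in_beats: list[float]) -> list[int]:
    """Beat vector B: onset positions quantized to the nearest whole beat.

    Onsets are assumed to be measured in beats from the start of the
    measure containing the first note; halves round upward (assumed).
    """
    return [floor(x + 0.5) for x in onsets_in_beats]


if __name__ == "__main__":
    melody = [60, 60, 62, 65, 64, 60]          # hypothetical MIDI note numbers
    print(pitch_contour(melody, [0, 1]))       # 3-level contour
    print(pitch_contour(melody, [0, 1, 3]))    # 5-level contour
    print(beat_vector([1.0, 1.9, 3.1, 4.0, 5.5]))
```

For the hypothetical melody above, Q = [0 1] gives 0, +, +, -, - and Q = [0 1 3] gives 0, +, ++, -, --, so the finer contour separates the whole-step from the minor third that the 3-level version merges. A vector Q of length m yields 2m - 1 contour levels under this reading.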

Related articles

Similarity matching of continuous melody contours for humming querying of melody databases

Music query-by-humming is a challenging problem since the humming query inevitably contains much variation and inaccuracy. Many previous methods, which adopt note segmentation and string matching with dynamic programming, suffer drastically from errors in note segmentation, which affect retrieval accuracy and efficiency. In this paper, we present a novel melody similarity matchin...

Was Parsons right? An experiment in usability of music representations for melody-based music retrieval

In 1975 Parsons developed his dictionary of musical themes based on a simple contour representation. The motivation was that people with little training in music would be able to identify pieces of music. We decided to test whether people of various levels of musical skill could indeed make use of a text representation to describe a simple melody query. The results indicate that the task is bey...

A Robust Music Retrieval Method for Query-by-Humming

In this paper, we present a novel melody representation and matching method, which is both robust against pitch errors and invariant to linear or non-linear tempo variation. The melody of a music item or a query is represented by a point sequence, which is derived from the pitch contour of the melody. This point sequence is invariant to the time or speed in the original melody contour. Important...

Applications of a Semi-automatic Melody Extraction Interface for Indian Music

Automatic extraction of the melody from polyphonic music recordings is a challenging problem for which no general solutions currently exist. We present a novel interface for semi-automatic melody extraction with the goal of providing highly accurate pitch tracks of the lead voice with minimal user intervention. Audio-visual feedback facilitates the validation of the obtained melodic contour, and ...

Schema-based processing in auditory scene analysis.

What is the involvement of what we know in what we perceive? In this article, the contribution of melodic schema-based processes to the perceptual organization of tone sequences is examined. Two unfamiliar six-tone melodies, one of which was interleaved with distractor tones, were presented successively to listeners who were required to decide whether the melodies were identical or different. I...

Publication date: 2000